12 research outputs found

    Enabling preemptive multiprogramming on GPUs

    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to meet key multiprogrammed workload requirements, such as responsiveness, fairness, or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels according to their priorities. We extend an NVIDIA GK110 (Kepler)-like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve the execution time of high-priority processes by 15.6x, the average application turnaround time by 1.5x to 2x, and system fairness by up to 3.4x.
    Acknowledgments: We would like to thank the anonymous reviewers, Alexander Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Alvarez, and Marc Jorda for their comments and help in improving our work and this paper. This work is supported by the European Commission through the TERAFLUX (FP7-249013), Mont-Blanc (FP7-288777), and RoMoL (GA-321253) projects, NVIDIA through the CUDA Center of Excellence program, the Spanish Government through Programa Severo Ochoa (SEV-2011-0067), and the Spanish Ministry of Science and Technology through the TIN2007-60625 and TIN2012-34557 projects.
    Peer Reviewed. Postprint (author's final draft).
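The abstract above describes a policy that distributes GPU cores among concurrent kernels according to their priorities, without giving its details. As a rough illustration only, the following is a minimal sketch of one possible proportional-share split, assuming each kernel carries a numeric priority weight. The function name `distribute_cores`, the kernel names, the weights, and the core count of 15 are all hypothetical and are not taken from the paper.

```python
def distribute_cores(priorities, num_cores):
    """Split num_cores among kernels in proportion to their priority weights.

    priorities: dict mapping a kernel name to a positive weight (hypothetical).
    Returns a dict mapping each kernel name to an integer number of cores.
    """
    total = sum(priorities.values())
    # Ideal fractional share of the cores for each kernel.
    ideal = {k: num_cores * w / total for k, w in priorities.items()}
    # Start from the rounded-down shares.
    alloc = {k: int(share) for k, share in ideal.items()}
    leftover = num_cores - sum(alloc.values())
    # Hand the remaining cores to the kernels with the largest fractional parts.
    by_remainder = sorted(priorities, key=lambda k: ideal[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Illustrative values only: three kernels sharing 15 cores.
shares = distribute_cores({"render": 4, "physics": 2, "background": 1}, 15)
print(shares)
```

A dynamic policy in this spirit would recompute the split whenever a kernel arrives or finishes, then preempt cores whose assignment changed. How the paper's hardware performs that step is not described in this listing.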

    CUsched: multiprogrammed workload scheduling on GPU architectures

    Graphics Processing Units (GPUs) are currently widely used in High Performance Computing (HPC) applications to speed up the execution of massively parallel codes. GPUs are well suited to such HPC environments because these applications share a common characteristic with the gaming codes GPUs were designed for: only one application uses the GPU at a time. Although minimal support for multi-programmed systems exists, modern GPUs do not allow resource sharing among different processes. This lack of support restricts the usage of GPUs in desktop and mobile environments to a small number of applications (e.g., games and multimedia players). In this paper we study the multi-programming support available in current GPUs and show that such support is not sufficient. We propose a set of hardware extensions to current GPU architectures to efficiently support multi-programmed GPU workloads, allowing concurrent execution of codes from different user processes. We implement several hardware schedulers on top of these extensions and analyze the behaviour of different work scheduling algorithms using system-wide and per-process metrics.
    Postprint (published version).

    Polilaktid kao komponenta ekološki prihvatljivih kompozitnih materijala [Polylactide as a component of environmentally friendly composite materials]

    The biodegradable linear aliphatic thermoplastic polyester poly(L-lactide) (PLA) can be produced from agricultural products such as corn or sugar beet. This polymer has been widely used as a biocompatible material in applications such as surgical sutures, medical implants, and controlled drug delivery systems. Owing to its good mechanical properties and versatile fabrication processes, PLA has tremendous potential in traditional applications such as food packaging, industrial devices, fibers, and green composites. The goal of this work was to use a thermoplastic elastomer to modify the mechanical properties of composite materials based on different PLA types (for extrusion, for blown films, and for biaxially oriented films) and silica nanoparticles.

    Razvoj postupaka polimerizacije L-laktida [Development of L-lactide polymerization procedures]

    To determine appropriate conditions for the polymerization of L-lactide to obtain poly(L-lactide) (PLLA), several different methods were applied: in closed vials under vacuum, in a reactor under high pressure, in a microwave reactor, and in solution using an initiator. The molecular masses of the prepared samples were determined by GPC. Microwave synthesis was found to have the shortest polymerization time (less than 30 minutes) and to yield the polymer with the highest molecular mass, 178,000 g mol⁻¹. The samples synthesized with trifluoromethanesulfonic acid as the initiator had the best thermal stability.

    Efficient exception handling support for GPUs

    Operating systems have long relied on the exception handling mechanism to implement numerous virtual memory features and optimizations. However, today's GPUs have limited support for exceptions, which prevents the implementation of such techniques. The existing solution forwards GPU memory faults to the CPU while the faulting instruction is stalled in the GPU pipeline. This approach prevents preemption of the faulting threads and results in underutilized hardware resources while the page fault is being resolved by the CPU. In this paper, we present three schemes for supporting GPU exceptions that allow the system software to preempt and restart the execution of the faulting code. There is a trade-off between the performance overhead introduced by adding exception support and the additional complexity. Our solutions range from 90% of the baseline performance with no area overheads to 99.2% of the baseline performance with less than 1% area and 2% power overheads. Experimental results also show a 10% performance improvement on some benchmarks when using this support to context switch the GPU during page migrations to hide their latency. We further observe up to 1.75x average speedup when implementing lazy memory allocation on the GPU, which is also made possible by our exception handling support.
    Acknowledgments: We would like to thank the anonymous reviewers, Lluis Vilanova, and Javier Cabezas for their help in improving this paper. Early discussions with Steve Keckler, Arslan Zulfiqar, Jack Choquette, and Olivier Giroux had a major influence on this work, for which we are very grateful. This work is supported by Nvidia through the GPU Center of Excellence program, the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), the Spanish Ministry of Science and Technology (TIN2015-65316-P), and the Generalitat de Catalunya (grants 2014-SGR-1051 and 2014-SGR-1272). Nacho Navarro passed away before this paper was published. This work would not have been possible without his guidance, support, and dedication. A memory of him will always live in his students, colleagues, and loved ones.
    Peer Reviewed. Postprint (published version).

    Strain determination of self-adhesive resin cement using 3D digital image correlation method

    Introduction/Objective: In an attempt to simplify dental procedures, a new group of resin cements, self-adhesive resin cements (SARCs), has been introduced. The performance of SARCs can vary widely. One of the main reasons for adhesion failure is polymerization shrinkage. The aim of this study was to determine and measure the strain field of a self-adhesive dual-cure resin cement during polymerization in self-cure mode using the 3D digital image correlation (DIC) method.
    Methods: The self-adhesive Maxcem Elite (Kerr, Orange, CA, USA) cement was tested in five cylindrical samples (5 mm in diameter and 2 mm in thickness) prepared by filling plastic ring-type molds. Digital images were recorded immediately after sample preparation.
    Results: A non-uniform strain distribution was found in the resin cement, with higher strain values along the periphery (up to 15%) and lower strain values in the central parts (around 4%) of each sample.
    Conclusion: It can be concluded that 3D DIC is a precise and reliable method for full-field strain measurements in material characterization.

    Synthesis and properties of novel star-shaped polyesters based on l-lactide and castor oil

    The topology of biodegradable polyesters can be adjusted by incorporating multifunctional polyols into the polyester backbone to obtain branched polymers. The aim of this study was to prepare biodegradable branched polyester polyols based on l-lactide and castor oil using trifluoromethanesulfonic acid as a catalyst. FTIR and ¹H NMR spectroscopy measurements were used to estimate the molecular structure of the novel materials. The polyester polyol was synthesized by the core-first method, which involves the polymerization of l-lactide using castor oil as a multifunctional initiator. Molar masses estimated by gel permeation chromatography and vapor pressure osmometry were in good correlation with values calculated from the hydroxyl number of the obtained polymers. DSC measurements confirmed a high degree of crystallinity in the synthesized material. The molar masses of the obtained polymers were found to significantly influence the glass transition temperature. The thermal stability was investigated by TG analysis, and the results showed that weight loss depends on the arm length of the star-shaped polyesters. The thermal stability of the star-shaped polyesters significantly decreased with degradation of the polyester polyol obtained in acid solution.

    Comparison based sorting for systems with multiple GPUs

    As a basic building block of many applications, sorting algorithms that run efficiently on modern machines are key to the performance of these applications. With the recent shift to using GPUs for general-purpose computing, researchers have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are designed to use all available GPUs in the system. In this paper we present a high-performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single-GPU sorting algorithm. Then, a series of merge steps produces a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speedup of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, respectively, compared to sorting on one GPU.
    Peer Reviewed. Postprint (published version).
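The two phases in the abstract above (local sort, then pivot-based merge that keeps the data evenly distributed) can be illustrated for the two-device case. This is a CPU-only sketch in Python, not the paper's CUDA implementation, and the binary-search pivot rule shown here is one plausible way to get an even split, not necessarily the authors' algorithm. Plain lists stand in for per-GPU buffers, both chunks are assumed to have equal length, and the names `select_pivot` and `sort_across_two_devices` are hypothetical.

```python
from heapq import merge


def select_pivot(a, b):
    """Find the smallest p such that swapping the last p items of `a` with the
    first p items of `b` leaves every item kept in `a` <= every item kept in `b`.

    Assumes `a` and `b` are sorted and have the same length.
    """
    n = len(a)
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        # After swapping `mid` items, a[n - mid - 1] is the largest item kept
        # in `a` and b[mid] is the smallest item kept in `b`.
        if a[n - mid - 1] <= b[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


def sort_across_two_devices(chunk0, chunk1):
    """Return (left, right): two equal-sized sorted halves of the combined data."""
    # Phase 1: each "device" sorts its own chunk independently.
    a, b = sorted(chunk0), sorted(chunk1)
    n = len(a)
    # Phase 2: pick the pivot, exchange p items, then merge locally.
    p = select_pivot(a, b)
    left = list(merge(a[:n - p], b[:p]))    # stays on device 0
    right = list(merge(a[n - p:], b[p:]))   # stays on device 1
    return left, right


left, right = sort_across_two_devices([5, 1, 9, 7], [2, 8, 3, 6])
print(left, right)
```

Because exactly p items move in each direction, both devices end up holding the same number of records as before, which is the even-distribution property the abstract attributes to its pivot selection.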
